Class Project: Analysis of the S&P 500 to show historical trends on when to make the most profit¶

David Meduga S&DS 123

Introduction¶

The "The S&P500 is a market-capitalization-weighted index of 500 leading publicly traded companies in the U.S. The index actually has 503 components because three of them have two share classes listed."(investopedia) The S&P500 is a leading indicator in the finanical world that tells the world about how the economy is doing and the financial wellbeing of the top 500 companies. The object of this project is to make a analysis of the S&P500 and based on this data show investing strageties where it was most profitable to make a trade and either buy long or sell short. In my Final Project, I will be coding the adjusted returns from the S&P 500, taking the growth rate of specific stocks, use the top 9 stocks that makes up a majority of the S&P500 to show there growth over time, and using a basic investment strategy to show when it was the time to optimally purchase or sell a indidual stock in the S&P500.

The project focus is on the Financial world because the financial world encompases everything about the economy and a impact in the stock market is usually due to a world problem that causes a serious problem in the economy. Being able to track the stock market and see the changes in real time is staying ahead of the game and having this information is very valuable. This information leads to profits, real world impacts, and innovation. This is why I am doing a analysis on the S&P500 to see the historical trends and make trading decisions based on previous data.

Click on this github page to interact the interactive graphs https://phonixfire01.github.io/YDATA_PROJECT/Final_Project.html

Where you got the data from, including a link to the website where you got the data if applicable.¶

I obtained the data from the python package yfinance which is from yahoo finance. I used the link https://en.wikipedia.org/wiki/List_of_S%26P_500_companies to get the list of the S&P500 companies and there Ticker Symbols.

What other analyses have already been done with the data, and possibly links to other analyses.¶

The other analyses that have been done with the S&P 500 are numerous. Here is a link to github https://github.com/topics/financial-analysis?l=python for 314 public records for others that have done similar analysis. There have been many other analysis of the the S&P500 ranging from corporations such as J.P. Morgan, Citidel, DRW Trading group. Each group has there own unique way of analysing the data and comes up with different conclusions on the most profitable trading strageties. They each use different techinques to bring about different amounts of profit and have a ranging varity of teams decitated to research the next stragety.

Data Wrangling¶

I am using adj_close = adjusted close of the stock, which is the closing price after adjustments for all applicable splits and dividend distributions, df= dataframe of the downloaded data from yfinance, fig = showing interavtive plots, top9stocks= top 9 stock in the S&P500, and rets = return rate, which is the net gain or loss of an investment over a specified time period, expressed as a percentage of the investment's initial cost. All other equations functions are explained as needed above the code.

Step 1 Downloading Yahoo Finance and Other Packages¶

In [1]:
#run the pip install if you haven't already installed the data
#pip install yfinance
import pandas as pd
import yfinance as yf
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import plotly.express as px
import plotly.graph_objects as go 
from datetime import datetime
%matplotlib inline
import pandas_datareader as web

Step 2 Web Scraping from Wikipedia to see all companies inside the S&P500¶

In [2]:
#The Wikipedia URL of the S&P 500
sp_wiki_url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

# Reading the HTML table into a list of DataFrames
sp_wiki_df_list = pd.read_html(sp_wiki_url)

# Only getting the information needed containing the company information
sp_df = sp_wiki_df_list[0]

# Extract the ticker symbol columns
sp_ticker_list = list(sp_df['Symbol'].values)
In [3]:
#now creating a dataframe from yahoo finance to extract the rest of the stocks data over a 44 year period 
df = yf.download(sp_ticker_list, start='1980-01-01', end='2024-01-01')
[*********************100%%**********************]  503 of 503 completed

4 Failed downloads:
['BF.B']: Exception('%ticker%: No price data found, symbol may be delisted (1d 1980-01-01 -> 2024-01-01)')
['SOLV', 'GEV']: Exception("%ticker%: Data doesn't exist for startDate = 315550800, endDate = 1704085200")
['BRK.B']: Exception('%ticker%: No timezone found, symbol may be delisted')
In [4]:
#Getting the Adjusted close for each Stock 
adj_close = df['Adj Close']
In [5]:
#putting it into a colum 
adj_close.columns
Out[5]:
Index(['A', 'AAL', 'AAPL', 'ABBV', 'ABNB', 'ABT', 'ACGL', 'ACN', 'ADBE', 'ADI',
       ...
       'WTW', 'WY', 'WYNN', 'XEL', 'XOM', 'XYL', 'YUM', 'ZBH', 'ZBRA', 'ZTS'],
      dtype='object', name='Ticker', length=503)
In [6]:
#pulling the S&P500 from yahoo finance and then using an interactive plot to show the adj close over time 
sp_ticker_list = ['^GSPC']
df = yf.download(sp_ticker_list, start='1980-01-01', end='2024-01-01')
#An interactive plot using Plotly
fig = px.line(df, x=df.index, y='Adj Close', title='S&P500 Adjusted Close Price')
fig.update_layout(
    xaxis_title='Date',
    yaxis_title='Price',
    height=800,
    width=1200)
fig.show()
[*********************100%%**********************]  1 of 1 completed

This graph illustrates the historical performance of the S&P 500 index over time, reflecting its fluctuations in value across a 40 year period. Understanding why these movements is crucial for finding the right time to invest and maximize returns from the S&P 500. The graph highlights periods of both growth and decline in the index's value, offering insights into the factors driving these fluctuations.¶

To explore the events that makes these fluctuations exist, let's explore the reasons contributing to the rises and dips observed in the graph. By analyzing factors influencing the financial markets, we can better comprehend the underlying drivers shaping the performance of the S&P 500.¶

Real World Impacts on the S&P500¶

In [7]:
df = yf.download("^GSPC", start = '1980-01-01')
[*********************100%%**********************]  1 of 1 completed
In [8]:
#Historical Financial crashes
important_dates = {
    'The Oil Crisis': '1982-04-29',
    'The Tech Bubble': '2000-09-11',
    'Financial Crisis and Great Recession': '2007-10-12',
    'Covid 19 Pandemic': '2020-03-20',
}
In [9]:
#using the interactive graph from earlier and putting a line when the financial Crashes happened
fig = go.Figure(data = [go.Candlestick(
    x = df.index,
    open = df['Open'],
    close = df['Close'],
    high = df['High'],
    low =df['Low'],)])
fig.update_layout(
    title= "S&P500 Shocks", 
    yaxis_title= "S&P500 Stock",
    shapes = [
        dict(x0 = '1982-04-29', x1= '1982-04-29', y0=0, y1=1, xref = 'x', yref= 'paper', line_width = 2), 
        dict(x0 = '2000-09-11', x1= '2000-09-11', y0=0, y1=1, xref = 'x', yref= 'paper', line_width = 2),
        dict(x0 = '2007-10-12', x1= '2007-10-12', y0=0, y1=1, xref = 'x', yref= 'paper', line_width = 2),
        dict(x0 = '2020-03-20', x1= '2020-03-20', y0=0, y1=1, xref = 'x', yref= 'paper', line_width = 2)], 
    annotations = [
        dict(x= '1982-04-29', y = 1.1 , xref ='x', yref = 'paper', showarrow = False, 
             xanchor = 'left', text = 'The Oil Crisis'),
        dict(x= '2000-09-11', y = 1.1, xref ='x', yref = 'paper', showarrow = False, 
             xanchor = 'left', text = 'The Tech Bubble'),
        dict(x= '2007-10-12', y = 1.1, xref ='x', yref = 'paper', showarrow = False, 
             xanchor = 'left', text = 'Financial Crisis and Great Recession'),
        dict(x= '2020-03-20', y = 1.1, xref ='x', yref = 'paper', showarrow = False, 
             xanchor = 'left', text = 'Covid 19 Pandemic')])
fig.update_layout(xaxis_rangeslider_visible= False)

This graph provides a historical perspective on how global events have profoundly influenced the performance of the S&P 500. It vividly depicts how the index responds during periods of recession or economic downturns, often experiencing downward shifts that persist for extended durations.¶

By studying the graph, investors can discern crucial insights into when to strategically enter or exit the market, aligning their investment choices with economic conditions. It marks the importance of conducting thorough analysis and considering broader macroeconomic trends when navigating the complexities of investing in the financial markets.¶

Now Let's view the top stocks that make up a large portion of the S&P500¶

In [25]:
top9stocks = dict(
    AAPL = "Apple stock",  AMZN = 'Amazon Stock',  NVDA = 'NVIDIA', GOOGL = 'Alphabet Class A',TSLA = 'Tesla', GOOG =  'Alphabet Class C',
    META = 'Meta Platforms Class A',  MSFT = 'Miscroft Stock ', UNH = 'United Health Group')
In [26]:
list(top9stocks.keys())
Out[26]:
['AAPL', 'AMZN', 'NVDA', 'GOOGL', 'TSLA', 'GOOG', 'META', 'MSFT', 'UNH']
In [27]:
df_top9 = yf.download(list(top9stocks.keys()))
[*********************100%%**********************]  9 of 9 completed
In [28]:
adj_close_top9 = df_top9['Adj Close']
adj_close_top9 = adj_close_top9.dropna()

Top 9 Stocks Adjusted Close Over Time¶

In [29]:
adj_close_top9.dropna().plot(subplots = True, figsize = (14,7));
No description has been provided for this image

This graph illustrates the average increase in stock value over the past decade, providing insight into how top-performing stocks generally track the movements of the S&P 500 index.¶

Now Let's clean up the data¶

In [30]:
#A clean version of all the stock and there respected name
for key, value in top9stocks.items():
    print(f"{key:8s} | {value}")
AAPL     | Apple stock
AMZN     | Amazon Stock
NVDA     | NVIDIA
GOOGL    | Alphabet Class A
TSLA     | Tesla
GOOG     | Alphabet Class C
META     | Meta Platforms Class A
MSFT     | Miscroft Stock 
UNH      | United Health Group
In [31]:
#rounding to the scond digit
adj_close_top9.describe().round(2)
Out[31]:
Ticker AAPL AMZN GOOG GOOGL META MSFT NVDA TSLA UNH
count 3007.00 3007.00 3007.00 3007.00 3007.00 3007.00 3007.00 3007.00 3007.00
mean 70.65 77.25 64.60 64.67 165.91 135.72 105.68 84.64 237.21
std 58.65 55.21 40.63 40.04 102.71 111.93 159.98 105.32 159.89
min 11.98 10.41 13.92 13.99 17.71 21.51 2.61 1.74 42.57
25% 24.16 21.35 29.24 29.73 82.06 40.38 5.27 14.35 100.45
50% 41.17 77.85 53.50 53.85 159.25 90.59 43.06 20.20 212.45
75% 129.51 123.54 96.09 95.64 212.84 234.75 138.29 180.94 380.82
max 197.86 189.05 173.69 171.95 527.34 429.37 950.02 409.97 548.93
In [32]:
#cleaning all the data to show only what we want to see
adj_close_top9.aggregate(['min', 'mean', 'std', 'median', 'max']).round(2)
Out[32]:
Ticker AAPL AMZN GOOG GOOGL META MSFT NVDA TSLA UNH
min 11.98 10.41 13.92 13.99 17.71 21.51 2.61 1.74 42.57
mean 70.65 77.25 64.60 64.67 165.91 135.72 105.68 84.64 237.21
std 58.65 55.21 40.63 40.04 102.71 111.93 159.98 105.32 159.89
median 41.17 77.85 53.50 53.85 159.25 90.59 43.06 20.20 212.45
max 197.86 189.05 173.69 171.95 527.34 429.37 950.02 409.97 548.93

Return Rate Graph for Top 9 Stock in the S&P500¶

In [33]:
#each graph indidually
rets.cumsum().apply(np.exp).plot(subplots = True, figsize = (14,7));
No description has been provided for this image

This visualizes the average returns of the stock over the past decade, offering valuable insights into the profitability of investing in these stocks compared to alternative options.¶

Now lets look at the data by week and year¶

In [34]:
adj_close_top9.resample('1w', label = 'right').last().head()
Out[34]:
Ticker AAPL AMZN GOOG GOOGL META MSFT NVDA TSLA UNH
Date
2012-05-20 16.036407 10.6925 14.953949 15.025025 38.189480 23.528460 2.770253 1.837333 44.901157
2012-05-27 17.001230 10.6445 14.733027 14.803053 31.876179 23.359650 2.843636 1.987333 46.672581
2012-06-03 16.961918 10.4110 14.221195 14.288789 27.690619 22.869312 2.747319 1.876667 45.774391
2012-06-10 17.546381 10.9240 14.457061 14.525776 27.071278 23.833920 2.779426 2.005333 48.236095
2012-06-17 17.359217 10.9175 14.060049 14.126877 29.978193 24.131342 2.818411 1.994000 49.165634
In [35]:
# Resampling the data to every month the data will show
adj_close_top9.resample('1m', label = 'right').first().head()
Out[35]:
Ticker AAPL AMZN GOOG GOOGL META MSFT NVDA TSLA UNH
Date
2012-05-31 16.036407 10.6925 14.953949 15.025025 38.189480 23.528460 2.770253 1.837333 44.901157
2012-06-30 16.961918 10.4110 14.221195 14.288789 27.690619 22.869312 2.747319 1.876667 45.774391
2012-07-31 17.915251 11.4660 14.457560 14.526276 30.737387 24.565416 3.084429 2.026667 46.961945
2012-08-31 18.347324 11.6045 15.757935 15.832833 20.857868 23.640997 3.070669 1.750000 42.746559
2012-09-30 20.495808 12.3940 16.962421 17.043043 17.711208 24.590597 3.045444 1.876000 45.542912
In [36]:
# Newly resampled graph to show every month return rate
rets.cumsum().apply(np.exp).resample("1m", label = "right").last().plot(figsize = (10,5));
No description has been provided for this image

This graph demonstrates the significance of timing investments across various months and years, highlighting how exponential stock returns can be affected by strategic entry points.¶

Let's dive into an analysis into Amazon Stock¶

In [39]:
#dropping the NA's
df = yf.download(list(top9stocks.keys()))
df = df['Open'].dropna()
[*********************100%%**********************]  9 of 9 completed
In [42]:
#The most recent data
df.tail()
Out[42]:
Ticker AAPL AMZN GOOG GOOGL META MSFT NVDA TSLA UNH
Date
2024-04-25 169.529999 169.679993 153.360001 151.330002 421.399994 394.029999 788.679993 158.960007 488.959991
2024-04-26 169.880005 177.800003 175.990005 174.369995 441.459991 412.170013 838.179993 168.850006 492.000000
2024-04-29 173.369995 182.750000 170.770004 169.059998 439.559998 405.250000 875.950012 188.419998 495.709991
2024-04-30 173.330002 181.089996 167.380005 165.610001 431.049988 401.489990 872.400024 186.979996 488.959991
2024-05-01 169.580002 181.639999 166.179993 164.300003 428.600006 392.609985 850.770020 182.000000 479.260010
In [43]:
#now let look at Amazon Stock
symbol = 'AMZN'
window = 20 

df['min'] = df[symbol].rolling(window=window).min()
df['mean'] = df[symbol].rolling(window=window).mean()
df['std'] = df[symbol].rolling(window=window).std()
df['median'] = df[symbol].rolling(window=window).median()
df['max'] = df[symbol].rolling(window=window).max()
df['ewma'] = df[symbol].ewm(halflife= .5, min_periods = window).mean()

Amazon stock price change over a year¶

In [44]:
ax = df[['min', 'mean', 'max']].iloc[-200:].plot(
    figsize = (10,6), style = ['g--', 'r--', 'g--'], 
    lw=.6
)
df[symbol].iloc[-200:].plot(ax=ax, lw= 2.0);
No description has been provided for this image

Over the past year, Amazon's stock witnessed exponential growth, and this graph illustrates the average rate of growth during this period, indicating a positive increase in value¶

Let's try a basic Techinical Analysis Using Simple Moving Average¶

In [48]:
#Using rolling Statistics
df['SMA1'] = df[symbol].rolling(window=42).mean()
df['SMA2'] = df[symbol].rolling(window=252).mean()
In [49]:
df[[symbol, 'SMA1', 'SMA2']].tail()
Out[49]:
Ticker AMZN SMA1 SMA2
Date
2024-04-25 169.679993 178.930238 144.104286
2024-04-26 177.800003 179.018809 144.393016
2024-04-29 182.750000 179.264285 144.689008
2024-04-30 181.089996 179.456666 144.980119
2024-05-01 181.639999 179.573095 145.284445
In [50]:
df.dropna(inplace = True)
In [51]:
#this is 1 is holding long and 2 is short
df['positions'] = np.where(df['SMA1'] > df['SMA2'], 1, -1)

Ploting Simple Moving Average¶

In [52]:
average_ax= df[[symbol, 'SMA1', 'SMA2', 'positions']].plot(figsize= (10,6),
        secondary_y = 'positions')
average_ax.get_legend().set_bbox_to_anchor((0.25, 0.85))
No description has been provided for this image
In [53]:
#now let's test the investment stragety 
import yfinance as yf

def compound_value(investment, symbol, start_date, end_date):
    stock_data = yf.download(symbol, start=start_date, end=end_date)
    daily_returns = stock_data['Close'].pct_change()
    compounded_returns = (1 + daily_returns).cumprod()
    compounded_value = investment * compounded_returns.iloc[-1]
    return round(compounded_value, 2)

#Example:
investment = 100
symbol = 'AMZN'
start_date = '2015-01-01'
end_date = '2019-01-01'
compound_value = compound_value(investment, symbol, start_date, end_date)
print(f'The compound value of the investment is: {compound_value}')
[*********************100%%**********************]  1 of 1 completed
The compound value of the investment is: 486.83

Wow, the possibility of achieving a 487% gain in just four years through market investments is truly amazing.¶

Let's Compare Two Individual Stocks¶

In [54]:
# Two Stocks competing in the same industry
stock1 = 'AAPL'
stock2 = 'MSFT'
In [55]:
 df = yf.download([stock1, stock2])
[*********************100%%**********************]  2 of 2 completed
In [56]:
df.dropna(inplace=True)

Graph of Apple and Miscroft Stock rise Over Time¶

In [57]:
df = df['Adj Close']
In [58]:
# Comparision of the stocks on the same graph
df.loc['2010':'2019'].plot(secondary_y=stock2, figsize= (10,6));
No description has been provided for this image

How Correlated are these two stocks?¶

In [89]:
rets.corr()
Out[89]:
Ticker AAPL MSFT
Ticker
AAPL 1.000000 0.458317
MSFT 0.458317 1.000000
In [92]:
#where the data from above shows on a graph with flucations
ax = rets[stock1].rolling(window=252).corr(rets[stock2]).plot(figsize=(10,6))
ax.axhline(rets.corr().iloc[0,1], c = 'r');
No description has been provided for this image

Wow, I didn't think think Miscroft and Apple would be correlated by that much but goes to show how similiar two companies in the same industry can be toward one another¶

In [64]:
import plotly.io as pio
pio.renderers.keys()
import plotly.io as pio
pio.renderers.default = 'jupyterlab'
pio.renderers.default = 'notebook' 

Conclusion¶

In conclusion, the finding of doing the analysis of the S&P500 include: How different stocks are corelated with one another, how less frequent simple trading strategies actually trades, and how indidual stock moves in a general direction. From this analysis, I know how to use the package yfinance in python and do a deep dive into a stock analysis getting the returns and growth rate over time. This is very helpful in insight into investing in any stock. This code is very verstile is discovering a trend among stocks and can compare stocks to each other to find their corelation rate to diversify their portfolio. I also learned that every substantial dip/loss when analyzing the S&P500 is usually due a substantial world diaster or real world economy problem. Knowing this information being able to stay informed throughout news channels and articles will help me better understand the stock market and know how a certian event will influence the market. This analysis will help others to look at stocks they are interested in and be able to see inforamtion in a orgainzed way to help better the individual investments. Using Python we are able to explore all this data and discover model to optimize our investments.

In exploring the question of when is the right time to invest? The answer now throughout all the data and analysis of the S&P500 is following a simple investing strategy like the Simple moving average model I showed early to make a 486% increase from your orginial investment but through all the research the best time to invest is when the stock is undervalued/ at it's lowest point and you sell once it's at it's highest point. Simple right but no one can ever predict this perfectly. The best we can do is look at historical data and see what the stock has done in the past while looking at current events that helps detemine the stocks price, but never wait for the perfect moment to invest in the stock market because the analysis I did in the invidual stocks show that stocks change frequently so invest as soon as you can. In the S&P500, which is made up of the top 500 performing stocks, has always been an increasing amount since the 1980s and insteading of worrying about the right time to invest and potentially lose out on money, invest now and let the profits rake in.

Reflection¶

This assignment was quite enjoyable because it allowed me to uncover trends that I never knew existed and observe how stocks evolve over time. It was fascinating to realize the practical applications of Python in the real world and how various professionals utilize it in their daily work. For individuals interested in investing, this assignment offered valuable insights into using data to identify optimal times to buy or sell investments. While plotting the data and effectively utilizing the yfinance package presented some challenges, overall, it was a rewarding experience in working with data.

Citations¶

Data Science for Everyone, "Financial Data for Python: yfinance." Youtube, uploaded by Data Science for Everyone, 2021, https://www.youtube.com/watch?v=7wAQCwdvqqo&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV

Data Science for Everyone, "Financial Data with Python: S&P 500 Data." Youtube, uploaded by Data Science for Everyone, 2022, https://www.youtube.com/watch?v=7wAQCwdvqqo&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=1

Data Science for Everyone, "Interactive Financial Plots with Plotly Express: S&P 500 Data." Youtube, uploaded by Data Science for Everyone, 2022, https://www.youtube.com/watch?v=lLYi-L5ptAk&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=19

Data Science for Everyone, "Introduction to Quick Candlestick Plots with Plotly: S&P 500 Data." Youtube, uploaded by Data Science for Everyone, 2022,https://www.youtube.com/watch?v=uidT_mdBzn4&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=20

Data Science for Everyone, "Candlestick Plot with JNJ & COVID Timeline." Youtube, uploaded by Data Science for Everyone, 2022, https://www.youtube.com/watch?v=oWkxWC9bc5Q&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=21

Data Science for Everyone, "Financial Data with Python." Youtube, uploaded by Data Science for Everyone, 2022, https://www.youtube.com/watch?v=jpj71hltkVQ&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=22

Data Science for Everyone, "Rolling Statistics for Financial Data with Python." Youtube, uploaded by Data Science for Everyone, 2022, https://www.youtube.com/watch?v=zYNWZmqR2mI&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=23

Data Science for Everyone, "Correlation Analysis with Financial Data." Youtube, uploaded by Data Science for Everyone, 2022,https://www.youtube.com/watch?v=ulbbzPG6ZHQ&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=24

Dsilva, M. (2022, July 12). Python Stock Analysis for Beginners. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2022/06/python-stock-analysis-for-beginners/

Fervent, "Calculating Stock Returns with Python (Code-along)." Youtube, uploaded by Fervent, 2020,https://www.youtube.com/watch?v=ulbbzPG6ZHQ&list=PLlbbWgBRF8EfO4WX13yEWlDUxkHsGPRdV&index=24

Used google to help with fixing errors and explaining information when unknown

APPENDIX¶

Reviewer name: Sergio Martinez Reviwer email address: sergio.martinez.sam333@yale.edu

Summary Please write a one paragraph (~3-6 sentence) summary of the project here. You should summarize the main goals and findings of the project.

This project focuses on analyzing the S&P500 index to develop effective investment strategies. By examining adjusted returns, top stock growth rates, and basic trading tactics, this project aims to explain how to optimize buying and selling decisions in today's financial world. Through these findings, key insights into stock correlations, infrequent trading strategies, and individual stock behaviors are uncovered. Furthermore, this project provides valuable tools for informed investing which aids in the process of navigating market fluctuations. Ultimately, this project offers organized insights to enhance individual investment decisions, highlighting the significance of data-driven approaches in the financial world.

  1. Overall strengths and weaknesses Please write 1-2 paragraphs that describe overall what you think the strengths and weakness are of the project. In particular, mention what you found interesting about the results, which analyses/visualizations you found convincing, and what could be done to potentially make the project stronger.

The strengths of this project include how detailed the explanations are with regard to the code. I was able to understand what exactly David wanted me to get from his code; which allowed me to navigate his project with ease. Some weaknesses that I did notice throughout the project were that the visualizations could have benefitted from written explanations, allowing readers to gain a deeper insight into what was trying to be visualized. Furthermore, finding a way to shorten this project could also be of use, given that a more concise project might be able to get your message across better to readers. Otherwise, your project is really insightful, and I enjoyed learning about the S&P500 index :) 2. Major revisions Add bullet points of items that you think should definitely be changed for the final submission of the project.

Making your project fit within the page limit (6 to 10 pages) Putting the visualizations that didn't show up on a GitHub and including a link on your project could also be of some benefit! 3. Minor revisions Add bullet points of items that are more minor, but that would be good to change for the final submission of the project.

Consider moving some of the code throughout your project to the appendix, so you can meet the 6-10 page limit. Consider describing your data wrangling a bit more. It could be helpful for readers to know what exactly you did to get the data ready for analysis! 3. Rubric score Please write a score for the project based on the project rubric that is on Canvas. For any items where there would be a point deduction, please cut and paste a bullet point for that item in the "Items for ponts take off" section below.

Rubric items where points would be taken off if not addressed: "Did not provide a written description of the insights that the graphs provide" Total score: 88/90

Thomas Snyder Reviwer email address: thomas.snyder@yale.edu

Summary The project aims to analyze the historical trends of the S&P500 index and develop investment strategies based on these insights. Utilizing Python and the yfinance package, the analysis focuses on correlations between stocks, frequency of trading strategies, and individual stock movements. Despite lacking clarity in articulating the specific research question, the project effectively demonstrates technical proficiency in data acquisition and analysis. Findings suggest that understanding market dynamics through data analysis can inform investment decisions, though deeper insights linking analysis results to initial objectives would enhance the project's impact and clarity. package for data acquisition and analysis, showing technical proficiency. The reflection

  1. Overall strengths and weaknesses The project provides a clear introduction to the S&P500 and its importance, establishing a context for the analysis. The project demonstrates the use of Python and yfinance

section offers personal insights and reflections on the project, showcasing engagement and critical thinking. The introduction lacks clarity in articulating the specific question or hypothesis being addressed by the analysis, which is a key component according to the rubric. Data cleaning and wrangling processes are mentioned but not clearly detailed, making it difficult to assess the rigor of these steps. The conclusions could be strengthened by providing more specific insights drawn from the analysis results, as well as tying them back to the initial question or objective. The analysis of correlations

4/18/24, 2:16 PM reviewer_template

file:///Users/snydert/Downloads/reviewer_template.html 1/2

between different stocks, frequency of trading strategies, and individual stock movements provides interesting insights into market dynamics. The use of Python for data analysis and visualization is convincing, demonstrating the potential for leveraging programming tools in financial analysis. 2. Major revisions Clearly articulate the specific research question or hypothesis driving the analysis in the introduction. Provide more detailed explanations and documentation of the data cleaning and wrangling processes. Strengthen the conclusions by explicitly linking the analysis findings to the initial objectives and providing deeper insights into the implications of the results.

  1. Minor revisions Improve the organization and clarity of the project report, ensuring smooth transitions between sections. Enhance the visualization quality and clarity, ensuring that all graphs are properly labeled and visually appealing. Consider incorporating statistical analyses or additional modeling techniques to deepen the analysis and provide more robust insights. Rubric Score: 80/90 Items for Points Take Off: Introduction: Did not clearly describe what question the analysis is addressing. Data cleaning: Did not provide a clear explanation of the data cleaning process. Data visualization: Visual appearance of graphs could be improved. Analyses: Analyses do not give clear insights into the question of interest. Conclusions: Conclusions reiterate test results with no significant insight given. Total score: 80/90

Miles Kirkpatrick Reviwer email address: miles.kirkpatrick@yale.edu

Summary Please write a one paragraph (~3-6 sentence) summary of the project here. You should summarize the main goals and findings of the project. David is picking apart the stock market. In-depth and fairly methodically, this project covers not just the broad strokes of the S&P 500 but also individual stocks and company growth metrics. It's an effective primer on the stock market. potentially make the project stronger.

  1. Overall strengths and weaknesses Please write 1-2 paragraphs that describe overall what you think the strengths and weakness are of the project. In particular, mention what you found interesting about the results, which analyses/visualizations you found convincing, and what could be done to

The strentgh of this project comes from its detail. fiscal data is easy to get lost in, but this project picks out the elements that are important and focuses on them, visualizing them in a variety of ways. All of the visualizations present something unique to the project. the weakness comes from the number of visualizations and the relative lack of transitions. theres a lot of information, and at times i felt lost and unsure why we were moving in one direction or another.

4/17/24, 11:41 PM David Review

file:///Users/mileskirkpatrick/Downloads/David Review (1).html 2/2

  1. Major revisions Add bullet points of items that you think should definitely be changed for the final submission of the project. Look at other analyses more in-depth Explain what we are looking at

  2. Minor revisions Add bullet points of items that are more minor, but that would be good to change for the final submission of the project. label your axes make it clearer what the takeaways are

  3. Rubric score Please write a score for the project based on the project rubric that is on Canvas. For any items where there would be a point deduction, please cut and paste a bullet point for that item in the "Items for ponts take off" section below. INTRO 13/15 DATA CLEANING 15/15 DATA VIS 20/25 ANALYSES 20/25 CONCLUSIONS 13/15 REFLECT 5/5 TOTAL /90 Rubric items where points would be taken off if not addressed: Did not describe what other analyses have been done on the data Cut some of the visualizations, we don't need all of them, although each individually is good Explain more what we are looking at give some solid takeaways, I know you have some good insights you got from this beyond developing your methodology Total score: 76/90

In [79]:
#this is the specific S&P500 data I am pulling to show over time the growth of the index 
df = yf.download("^GSPC", start = '1980-01-01')
[*********************100%%**********************]  1 of 1 completed
In [80]:
#this is a interactive Candlestick graph which shows the open, close, high, and low of the S&P500
fig = go.Figure(data = [go.Candlestick(
    x = df.index,
    open = df['Open'],
    close = df['Close'],
    high = df['High'],
    low =df['Low']
)
])
fig.update_layout(title='Candlestick of the S&P500')
fig.show()
In [78]:
adj_close.head()
Out[78]:
Ticker A AAL AAPL ABBV ABNB ABT ACGL ACN ADBE ADI ... WTW WY WYNN XEL XOM XYL YUM ZBH ZBRA ZTS
Date
2000-01-03 43.613018 NaN 0.846127 NaN NaN 8.992847 1.277778 NaN 16.274673 28.438276 ... NaN 11.505336 NaN 6.977994 18.328699 NaN 4.680298 NaN 25.027779 NaN
2000-01-04 40.281456 NaN 0.774790 NaN NaN 8.735912 1.270833 NaN 14.909401 26.999619 ... NaN 11.073115 NaN 7.138671 17.977631 NaN 4.586222 NaN 24.666668 NaN
2000-01-05 37.782791 NaN 0.786128 NaN NaN 8.719852 1.388889 NaN 15.204173 27.393778 ... NaN 11.659698 NaN 7.414118 18.957691 NaN 4.609740 NaN 25.138889 NaN
2000-01-06 36.344158 NaN 0.718097 NaN NaN 9.024963 1.375000 NaN 15.328290 26.644884 ... NaN 12.205122 NaN 7.345260 19.937767 NaN 4.570541 NaN 23.777779 NaN
2000-01-07 39.372856 NaN 0.752113 NaN NaN 9.121320 1.451389 NaN 16.072983 27.393778 ... NaN 11.803775 NaN 7.345260 19.879253 NaN 4.468628 NaN 23.513889 NaN

5 rows × 503 columns

In [83]:
#this is an interactive candlestick chart to measure when the stock was going up and down in a bigger version 
fig.update_layout(
    title='Interactive Candlestick Chart',
    width=1200,  # Set the width of the chart
    height=800,  # Set the height of the chart
    xaxis_rangeslider_visible=False,  # Hide the range slider
    showlegend=False # Hide the legend
)
In [89]:
annotations = []
for name in important_dates:
    annotations.append(dict(x= important_dates[name], y = 1.1 , xref ='x', yref = 'paper', showarrow = False, 
         xanchor = 'left', text = name))
    
annotations
Out[89]:
[{'x': 'The Oil Crisis',
  'y': 1.1,
  'xref': 'x',
  'yref': 'paper',
  'showarrow': False,
  'xanchor': 'left',
  'text': '1982-04-29'},
 {'x': 'The Tech Bubble',
  'y': 1.1,
  'xref': 'x',
  'yref': 'paper',
  'showarrow': False,
  'xanchor': 'left',
  'text': '2000-09-11'},
 {'x': 'Financial Crisis and Great Recession',
  'y': 1.1,
  'xref': 'x',
  'yref': 'paper',
  'showarrow': False,
  'xanchor': 'left',
  'text': '2007-10-12'},
 {'x': 'Covid 19 Pandemic',
  'y': 1.1,
  'xref': 'x',
  'yref': 'paper',
  'showarrow': False,
  'xanchor': 'left',
  'text': '2020-03-20'}]
In [90]:
for i in range(len(important_dates)):
    dict(x0 = '1982-04-29', x1= '1982-04-29', y0=0, y1=1, xref = 'x', yref= 'paper', line_width = 2)
i
Out[90]:
3
In [112]:
#the dtype of the stocks 
adj_close_top9.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3004 entries, 2012-05-18 to 2024-04-26
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   AAPL    3004 non-null   float64
 1   AMZN    3004 non-null   float64
 2   GOOG    3004 non-null   float64
 3   GOOGL   3004 non-null   float64
 4   META    3004 non-null   float64
 5   MSFT    3004 non-null   float64
 6   NVDA    3004 non-null   float64
 7   TSLA    3004 non-null   float64
 8   UNH     3004 non-null   float64
dtypes: float64(9)
memory usage: 234.7 KB
In [113]:
adj_close_top9.mean()
Out[113]:
Ticker
AAPL      70.546617
AMZN      77.153430
GOOG      64.503532
GOOGL     64.573072
META     165.646974
MSFT     135.464508
NVDA     104.924719
TSLA      84.535851
UNH      236.966149
dtype: float64
In [114]:
adj_close_top9.diff().head()
Out[114]:
Ticker AAPL AMZN GOOG GOOGL META MSFT NVDA TSLA UNH
Date
2012-05-18 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2012-05-21 0.934280 0.2130 0.341470 0.343093 -4.195549 0.385839 0.048158 0.080667 1.297390
2012-05-22 -0.130308 -0.1390 -0.331507 -0.333083 -3.026787 0.008049 -0.034399 0.135333 0.141376
2012-05-23 0.410896 0.0975 0.215691 0.216717 0.998940 -0.522499 0.068798 0.014667 -0.299408
2012-05-24 -0.158428 -0.1020 -0.144458 -0.145144 1.028908 -0.032160 -0.075677 -0.049333 0.715233
In [115]:
adj_close_top9.pct_change().round(3).head()
Out[115]:
Ticker AAPL AMZN GOOG GOOGL META MSFT NVDA TSLA UNH
Date
2012-05-18 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2012-05-21 0.058 0.020 0.023 0.023 -0.110 0.016 0.017 0.044 0.029
2012-05-22 -0.008 -0.013 -0.022 -0.022 -0.089 0.000 -0.012 0.071 0.003
2012-05-23 0.024 0.009 0.014 0.014 0.032 -0.022 0.025 0.007 -0.006
2012-05-24 -0.009 -0.009 -0.010 -0.010 0.032 -0.001 -0.027 -0.024 0.016
In [116]:
adj_close_top9.pct_change().mean().plot(kind = "bar", figsize = (5,5));
No description has been provided for this image
In [117]:
rets.cumsum().apply(np.exp).plot(figsize = (10,5));
No description has been provided for this image
In [118]:
# Resampling the data to every week the data will show
adj_close_top9.resample('1w', label = 'right').last().head()  
Out[118]:
Ticker AAPL AMZN GOOG GOOGL META MSFT NVDA TSLA UNH
Date
2012-05-20 16.036409 10.6925 14.953949 15.025025 38.189480 23.528460 2.770252 1.837333 44.901146
2012-05-27 17.001226 10.6445 14.733027 14.803053 31.876179 23.359650 2.843637 1.987333 46.672569
2012-06-03 16.961922 10.4110 14.221195 14.288789 27.690619 22.869308 2.747319 1.876667 45.774391
2012-06-10 17.546377 10.9240 14.457061 14.525776 27.071278 23.833923 2.779425 2.005333 48.236092
2012-06-17 17.359219 10.9175 14.060049 14.126877 29.978193 24.131344 2.818410 1.994000 49.165649
In [ ]:
df3 = yf.download([stock1, stock2])
In [129]:
#comparison the two stock individualy
df3['Adj Close'].plot(subplots=True, figsize = (10,6));
No description has been provided for this image
In [83]:
pd.plotting.scatter_matrix(rets,
                           alpha =.2,
                           diagonal = 'kde',
                           hist_kwds = {'bins':35},
                           figsize = (10,6));
No description has been provided for this image
In [141]:
rets.plot(subplots = True, figsize = (10,6));
No description has been provided for this image
In [23]:
#shifting the data by 1 row down
adj_close_top9.shift(1)
Out[23]:
Ticker AAPL AMZN GOOG GOOGL META MSFT NVDA TSLA UNH
Date
2012-05-18 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2012-05-21 16.036409 10.692500 14.953949 15.025025 38.189480 23.528463 2.770253 1.837333 44.901150
2012-05-22 16.970686 10.905500 15.295419 15.368118 33.993931 23.914299 2.818410 1.918000 46.198544
2012-05-23 16.840378 10.766500 14.963912 15.035035 30.967144 23.922340 2.784012 2.053333 46.339912
2012-05-24 17.251286 10.864000 15.179603 15.251752 31.966084 23.399841 2.852809 2.068000 46.040520
... ... ... ... ... ... ... ... ... ...
2024-04-25 169.020004 176.589996 161.100006 159.130005 493.500000 409.059998 796.770020 162.130005 487.299988
2024-04-26 169.889999 173.669998 157.949997 156.000000 441.380005 399.040009 826.320007 170.179993 493.859985
2024-04-29 169.300003 179.619995 173.690002 171.949997 443.290009 406.320007 877.349976 168.289993 495.350006
2024-04-30 173.500000 180.960007 167.899994 166.149994 432.619995 402.250000 877.570007 194.050003 489.029999
2024-05-01 170.330002 175.000000 164.639999 162.779999 430.170013 389.329987 864.020020 183.279999 483.700012

3007 rows × 9 columns

In [63]:
df_small = df.loc['2010':'2019']
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]: